Stance detection models tend to rely on dataset bias in the text part as a shortcut and thus fail to sufficiently learn the interaction between the targets and texts. Recent debiasing methods usually treat features learned by small models, or by large models at earlier training steps, as bias features, and propose to exclude the branch learning those bias features during inference. However, most of these methods fail to disentangle the ``good'' stance features from the ``bad'' bias features in the text part. In this paper, we investigate how to mitigate dataset bias in stance detection. Motivated by causal effects, we leverage a novel counterfactual inference framework, which enables us to capture the dataset bias in the text part as the direct causal effect of the text on stances, and to reduce this bias by subtracting the direct text effect from the total causal effect. We model bias features, in a novel way, as features that correlate with the stance labels but fail on intermediate stance-reasoning subtasks, and propose an adversarial bias learning module to model the bias more accurately. To verify whether our model better captures the interaction between texts and targets, we test it on recently proposed test sets that evaluate understanding of the task from various aspects. Experiments demonstrate that our proposed method (1) models the bias features better, and (2) outperforms existing debiasing baselines on both the original dataset and most of the newly constructed test sets.
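The subtraction of the direct text effect from the total causal effect can be sketched at inference time as follows. This is a minimal illustration: the two-branch setup, the logit fusion, and the scaling factor `alpha` are assumptions for exposition, not the paper's exact formulation.

```python
def debiased_logits(total_effect, text_only, alpha=1.0):
    """Counterfactual inference sketch: subtract the direct (text-only)
    causal effect from the total effect to suppress text-side dataset bias."""
    return [t - alpha * b for t, b in zip(total_effect, text_only)]

# Toy example: the text-only branch is strongly biased toward class 0.
total = [4, 3, 1]   # logits from the full text+target model (total effect)
bias = [3, 1, 1]    # logits from the text-only branch (direct effect)
print(debiased_logits(total, bias))
```

After subtraction, the prediction flips away from the class the text-only shortcut favored, which is exactly the intended debiasing behavior.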
translated by Google Translate
Heterogeneous comprehensive learning particle swarm optimization (HCLPSO) is an evolutionary algorithm with enhanced exploration and exploitation capabilities. Compared with random sequences, low-discrepancy sequences (LDS) cover the search space more uniformly. In this paper, we investigate how to exploit the good uniformity of LDS to improve HCLPSO. Numerical experiments show that merely using an LDS to generate the initial population cannot effectively improve the search capability of HCLPSO. However, if we properly select some of the random sequences in the HCLPSO velocity-update formula and replace them with deterministic LDS, a more efficient algorithm is obtained. Compared with the original HCLPSO under the same accuracy requirement, the HCLPSO that updates velocities with deterministic LDS significantly reduces the number of iterations needed to find the optimal solution, without lowering the success rate.
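The velocity-update substitution described above can be sketched as follows. The van der Corput sequence is used here purely as an illustrative LDS, and the update rule is a simplified single-exemplar form, not the full HCLPSO formula.

```python
def van_der_corput(n, base=2):
    """n-th element of the van der Corput low-discrepancy sequence in [0, 1)."""
    q, denom = 0.0, 1
    while n:
        denom *= base
        n, r = divmod(n, base)
        q += r / denom
    return q

def update_velocity(v, x, pbest, k, w=0.7, c=1.5):
    """Simplified PSO-style velocity update where the usual uniform random
    factor is replaced by the k-th deterministic LDS draw."""
    return w * v + c * van_der_corput(k) * (pbest - x)
```

Because the LDS draws are deterministic, reruns of the algorithm follow identical trajectories, and the draws fill [0, 1) more evenly than pseudo-random numbers.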
Subpopulation shift widely exists in many real-world machine learning applications, referring to training and test distributions that contain the same subpopulation groups but differ in subpopulation frequencies. Importance reweighting is a normal way to handle the subpopulation shift issue by imposing constant or adaptive sampling weights on each sample in the training dataset. However, some recent studies have recognized that most of these approaches fail to improve over empirical risk minimization, especially when applied to over-parameterized neural networks. In this work, we propose a simple yet practical framework, called uncertainty-aware mixup (UMIX), to mitigate the overfitting issue in over-parameterized models by reweighting the ``mixed'' samples according to the sample uncertainty. Training-trajectory-based uncertainty estimation is equipped in the proposed UMIX for each sample to flexibly characterize the subpopulation distribution. We also provide an insightful theoretical analysis to verify that UMIX achieves better generalization bounds than prior work. Furthermore, we conduct extensive empirical studies across a wide range of tasks to validate the effectiveness of our method, both qualitatively and quantitatively.
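The core reweight-the-mixed-samples idea can be sketched as below. The linear weight form and the trade-off `tau` are illustrative assumptions; the paper's trajectory-based uncertainty estimator is not reproduced here.

```python
def uncertainty_weight(u, tau=1.0):
    """Map a sample's uncertainty u in [0, 1] to an importance weight
    (assumed linear form: more uncertain samples count more)."""
    return 1.0 + tau * u

def umix(x1, x2, u1, u2, lam):
    """Mix two feature vectors and combine their uncertainty-based weights,
    so the mixed sample's loss weight follows the mixing coefficient lam."""
    mixed = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    weight = lam * uncertainty_weight(u1) + (1 - lam) * uncertainty_weight(u2)
    return mixed, weight
```

The returned weight would then scale the mixed sample's training loss, so pairs containing hard (uncertain) samples contribute more to the gradient.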
Since aspect-level sentiment labels are expensive and labor-intensive to obtain, zero-shot aspect-level sentiment classification has been proposed to learn classifiers applicable to new domains without using any annotated aspect-level data. In contrast, document-level sentiment data with ratings are easier to access. In this work, we achieve zero-shot aspect-level sentiment classification using only document-level reviews. Our key intuition is that the sentiment representation of a document is composed of the sentiment representations of all the aspects in that document. Based on this, we propose the AF-DSC method to explicitly model such sentiment composition in reviews. AF-DSC first learns sentiment representations for all potential aspects and then aggregates aspect-level sentiment into document-level sentiment to perform document-level sentiment classification. In this way, we obtain the aspect-level classifier as a by-product of the document-level classifier. Experimental results on aspect-level sentiment classification benchmarks demonstrate the effectiveness of explicitly exploiting sentiment composition in document-level classification. Our model with only 30k training data outperforms previous work that utilizes millions of data.
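The composition step, aggregating aspect-level sentiment into a document-level score, can be sketched as a weighted average. The weighted-average form and the weights themselves are assumptions for illustration; in AF-DSC both would be learned.

```python
def doc_sentiment(aspect_scores, aspect_weights):
    """Compose a document-level sentiment score as the weighted average
    of its aspect-level sentiment scores (illustrative composition rule)."""
    total = sum(aspect_weights)
    return sum(s * w for s, w in zip(aspect_scores, aspect_weights)) / total
```

Training this composition against document-level ratings is what yields aspect-level scores as a by-product: the per-aspect inputs must be meaningful for the aggregate to fit.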
Recently, pioneering research works have proposed a large number of acoustic features (log power spectrogram, linear frequency cepstral coefficients, constant-Q cepstral coefficients, etc.) for audio deepfake detection, obtaining good performance and showing that different subbands contribute differently to audio deepfake detection. However, this lacks an explanation of the specific information carried by the subbands, and these features also lose information such as phase. Inspired by the mechanism of speech synthesis, in which fundamental frequency (F0) information is used to improve the quality of synthetic speech, we observe that the F0 of synthetic speech is still too averaged and differs significantly from that of real speech. F0 can therefore be expected to serve as important information for discriminating between real and fake speech, but this information cannot be used directly because the distribution of F0 is irregular. Instead, the frequency band containing most of the F0 is selected as an input feature. Meanwhile, to make full use of phase and full-band information, we also propose to use the real and imaginary spectrograms as complementary input features and to model disjoint subbands separately. Finally, the results from F0 and from the real and imaginary spectrograms are fused. Experimental results on the ASVspoof 2019 LA dataset show that our proposed system is very effective for the audio deepfake detection task, achieving an equal error rate (EER) of 0.43%, which surpasses almost all systems.
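Extracting the two kinds of input features described above can be sketched as follows. The naive DFT and the 400 Hz F0-band cutoff are illustrative assumptions, not the system's actual front end.

```python
import cmath

def frame_spectrum(frame):
    """Naive DFT of one frame; returns the real and imaginary spectrograms
    separately, so phase information is kept (unlike magnitude-only features)."""
    n = len(frame)
    spec = [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]
    return [c.real for c in spec], [c.imag for c in spec]

def f0_band(spectrum, sample_rate, hi_hz=400):
    """Keep only the low-frequency bins assumed to contain most F0 energy
    (the hi_hz cutoff is an illustrative choice)."""
    n = len(spectrum)
    return [v for k, v in enumerate(spectrum) if k * sample_rate / n <= hi_hz]
```

A real system would use an FFT over windowed frames; the point here is only that real/imaginary parts and a low-frequency F0 band are separate, complementary views of the same frame.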
Whole slide image (WSI) classification often relies on deep weakly supervised multiple instance learning (MIL) methods to handle gigapixel-resolution images and slide-level labels. Yet the decent performance of deep learning comes from harnessing massive datasets and diverse samples, urging the need for efficient training pipelines that scale to large datasets and for data augmentation techniques that diversify samples. However, current MIL-based WSI classification pipelines are memory-expensive and computation-intensive, because they usually assemble tens of thousands of patches as bags for computation. On the other hand, despite their popularity in other tasks, data augmentations remain unexplored for WSI MIL frameworks. To address these issues, we propose ReMix, a general and efficient framework for MIL-based WSI classification. It comprises two steps: reduce and mix. First, it reduces the number of instances in WSI bags by substituting instances with instance prototypes, i.e., patch cluster centroids. Then, we propose a ``Mix-the-bag'' augmentation that contains four online, stochastic, and flexible latent-space augmentations. It brings diverse and reliable class-identity-preserving semantic changes in the latent space while enforcing semantic-perturbation invariance. We evaluate ReMix with two state-of-the-art MIL methods on two public datasets. In our experiments, consistent improvements in precision, accuracy, and recall have been achieved, with reduced training time and memory consumption, demonstrating the effectiveness and efficiency of ReMix. Code is available.
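The reduce step and one possible latent-space mix can be sketched as below. The nearest-prototype reduction follows the description above; the interpolation augmentation is only one assumed instance of the four "Mix-the-bag" variants.

```python
def reduce_bag(instances, prototypes):
    """ReMix 'reduce' sketch: map each patch embedding to its nearest
    prototype (patch cluster centroid) and keep the deduplicated prototypes."""
    def nearest(x):
        return min(range(len(prototypes)),
                   key=lambda j: sum((a - b) ** 2
                                     for a, b in zip(x, prototypes[j])))
    kept = sorted({nearest(x) for x in instances})
    return [prototypes[j] for j in kept]

def mix_prototypes(p, q, lam=0.5):
    """One assumed latent-space 'mix' augmentation: interpolate prototypes."""
    return [lam * a + (1 - lam) * b for a, b in zip(p, q)]
```

Replacing thousands of patch embeddings with a handful of centroids is what shrinks the bag, and hence training time and memory, while the latent interpolation diversifies bags without touching pixels.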
Document-level relation extraction (DRE) aims to identify the relations between two entities, where an entity may correspond to multiple mentions that span beyond sentence boundaries. Few previous studies have investigated mention integration, which may be problematic because coreferential mentions do not contribute equally to a specific relation. Moreover, prior efforts mainly focus on reasoning at the entity level rather than capturing the global interactions between entity pairs. In this paper, we propose two novel techniques, context-guided mention integration and pair interaction reasoning (CGM2IR), to improve DRE. Instead of simply applying average pooling, contexts are utilized to guide the integration of coreferential mentions in a weighted-sum manner. In addition, pair interaction reasoning executes an iterative algorithm on the entity-pair graph to model the interdependency of relations. We evaluate our CGM2IR model on three widely used benchmark datasets, namely DocRED, CDR, and GDA. Experimental results show that our model outperforms previous state-of-the-art models.
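The context-guided weighted sum over coreferential mentions can be sketched as follows. The dot-product-with-context scoring and softmax weighting are assumed forms for illustration, not the paper's exact attention function.

```python
import math

def integrate_mentions(mentions, context):
    """Context-guided integration sketch: weight coreferential mention
    embeddings by softmax(dot(mention, context)) instead of average pooling."""
    scores = [sum(m_d * c_d for m_d, c_d in zip(m, context)) for m in mentions]
    peak = max(scores)                       # for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(context)
    return [sum(w * m[d] for w, m in zip(weights, mentions)) for d in range(dim)]
```

Unlike average pooling, mentions better aligned with the pair's context dominate the resulting entity representation.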
Principal component analysis (PCA) is a popular dimensionality reduction technique for vector data. Factored PCA (FPCA) is a probabilistic extension of PCA for matrix data, which can substantially reduce the number of parameters in PCA while yielding satisfactory performance. However, FPCA is based on the Gaussian assumption and is thus susceptible to outliers. Although the multivariate t distribution has a long history as a robust modeling tool for vector data, its application to matrix data is very limited. The main reason is that the dimensionality of vectorized matrix data is often very high, and the higher the dimensionality, the lower the breakdown point that measures robustness. To address the robustness issue suffered by FPCA and make it applicable to matrix data, in this paper we propose a robust extension of FPCA (RFPCA), which is built upon a t-type distribution called the matrix-variate t distribution. Like the multivariate t distribution, the matrix-variate t distribution can adaptively down-weight outliers and yield robust estimates. We develop a fast EM-type algorithm for parameter estimation. Experiments on synthetic and real-world datasets show that RFPCA compares favorably with several related methods, and that RFPCA is a simple but powerful tool for matrix-valued outlier detection.
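The adaptive down-weighting behind t-based robust estimation can be illustrated in one dimension. This is only a sketch of the EM weighting pattern under a t model; the actual RFPCA algorithm estimates matrix-variate location, scale, and factor parameters.

```python
def robust_mean(xs, nu=4.0, iters=20):
    """1-D illustration of EM-type robust estimation under a t model:
    points far from the current estimate get weights close to zero."""
    mu = sum(xs) / len(xs)                   # start from the sample mean
    s2 = sum((x - mu) ** 2 for x in xs) / len(xs) + 1e-12
    for _ in range(iters):
        # E-step: t weights, (nu + 1) / (nu + squared scaled distance)
        w = [(nu + 1) / (nu + (x - mu) ** 2 / s2) for x in xs]
        # M-step: weighted location and scale updates
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        s2 = sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / len(xs) + 1e-12
    return mu
```

With a gross outlier present, the iterations pull the estimate back toward the bulk of the data, whereas the plain mean is dragged far away; this is the down-weighting property the matrix-variate t distribution provides for RFPCA.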
Text-based speech editing allows users to edit speech by intuitively cutting, copying, and pasting text to speed up the process of editing speech. In previous work, CampNet (context-aware mask prediction network) was proposed to realize text-based speech editing, significantly improving the quality of edited speech. This paper aims at a new task: adding emotional effect to the edited speech during text-based speech editing to make the generated speech more expressive. To achieve this task, we propose Emo-CampNet (emotion CampNet), which can provide the option of emotional attributes for the generated speech in text-based speech editing and has the one-shot ability to edit unseen speakers' speech. Firstly, we propose an end-to-end emotion-selectable text-based speech editing model. The key idea of the model is to control the emotion of generated speech by introducing additional emotion attributes based on the context-aware mask prediction network. Secondly, to prevent the emotion of the generated speech from being interfered with by the emotional components in the original speech, a neutral content generator is proposed to remove the emotion from the original speech, which is optimized by the generative adversarial framework. Thirdly, two data augmentation methods are proposed to enrich the emotional and pronunciation information in the training set, which enables the model to edit unseen speakers' speech. The experimental results show that 1) Emo-CampNet can effectively control the emotion of the generated speech in the process of text-based speech editing and can edit unseen speakers' speech; 2) detailed ablation experiments further prove the effectiveness of the emotion selectivity and data augmentation methods. The demo page is available at https://hairuo55.github.io/Emo-CampNet/
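The emotion-attribute conditioning idea can be sketched in its simplest form: attaching an emotion embedding to every context frame before mask prediction. The lookup table, embedding shape, and concatenation scheme are all hypothetical; the actual model learns these jointly with the network.

```python
def condition_on_emotion(context_frames, emotion, emotion_table):
    """Sketch of emotion-attribute conditioning: append a (hypothetical)
    learned emotion embedding to every context frame, so the mask
    predictor can generate speech with the requested emotion."""
    emo = emotion_table[emotion]
    return [frame + emo for frame in context_frames]
```

The decoder then sees the same content/context features plus a constant emotion code, which is the standard way to make one attribute selectable at generation time.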
Large-scale cross-modal pre-training paradigms have recently shown ubiquitous success on a wide range of downstream tasks, e.g., zero-shot classification, retrieval and image captioning. However, their successes highly rely on the scale and quality of web-crawled data that naturally contain incomplete and noisy information (e.g., wrong or irrelevant content). Existing works either design manual rules to clean data or generate pseudo-targets as auxiliary signals for reducing noise impact, which do not explicitly tackle both the incorrect and incomplete challenges simultaneously. In this paper, to automatically mitigate the impact of noise by solely mining over existing data, we propose a principled Noise-robust Language-Image Pre-training framework (NLIP) to stabilize pre-training via two schemes: noise-harmonization and noise-completion. First, in noise-harmonization scheme, NLIP estimates the noise probability of each pair according to the memorization effect of cross-modal transformers, then adopts noise-adaptive regularization to harmonize the cross-modal alignments with varying degrees. Second, in noise-completion scheme, to enrich the missing object information of text, NLIP injects a concept-conditioned cross-modal decoder to obtain semantic-consistent synthetic captions to complete noisy ones, which uses the retrieved visual concepts (i.e., objects' names) for the corresponding image to guide captioning generation. By collaboratively optimizing noise-harmonization and noise-completion schemes, our NLIP can alleviate the common noise effects during image-text pre-training in a more efficient way. Extensive experiments show the significant performance improvements of our NLIP using only 26M data over existing pre-trained models (e.g., CLIP, FILIP and BLIP) on 12 zero-shot classification datasets, MSCOCO image captioning and zero-shot image-text retrieval tasks.
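The noise-harmonization scheme's "varying degrees" of regularization can be sketched as a per-pair loss weight derived from the estimated noise probability. The exponential down-weighting form and the temperature `tau` are assumptions for illustration, not NLIP's exact regularizer.

```python
import math

def noise_adaptive_loss(pair_losses, noise_probs, tau=2.0):
    """Noise-harmonization sketch: scale each image-text pair's alignment
    loss by exp(-tau * noise_prob), so likely-noisy pairs contribute less."""
    weights = [math.exp(-tau * p) for p in noise_probs]
    return sum(w * l for w, l in zip(weights, pair_losses)) / sum(weights)
```

In this toy setting a pair flagged as certainly noisy barely moves the aggregate loss, which is the harmonizing effect the memorization-based noise estimate is meant to drive.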